Systematic exploration of error sources in pyrosequencing flowgram data

Authors

  • Susanne Balzer
  • Ketil Malde
  • Inge Jonassen

Abstract

MOTIVATION: 454 pyrosequencing, by Roche Diagnostics, has emerged as an alternative to Sanger sequencing in terms of read length, performance and cost, but it has higher per-base error rates. Although several noise-removal tools are available, each targeting different application fields, data interpretation would benefit from a better understanding of the different error types.

RESULTS: By exploring raw 454 data, we quantify the extent to which different factors account for sequencing errors. In addition to the well-known homopolymer length inaccuracies, we have identified errors that likely originate from other stages of the sequencing process. We use these findings to extend the flowsim pipeline with functionality to simulate these errors, enabling a more realistic simulation of 454 pyrosequencing data with flowsim.

AVAILABILITY: The flowsim pipeline is freely available under the General Public License from http://biohaskell.org/Applications/FlowSim.

CONTACT: [email protected]
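To make the terms "flowgram" and "homopolymer length inaccuracy" concrete, here is a minimal sketch, assuming a repeating T, A, C, G flow order, a naive base caller that rounds each flow value to the nearest integer, and a hypothetical Gaussian noise model whose spread grows with homopolymer length. This is not the flowsim pipeline or its error model; the function names `sequence_to_flowgram`, `flowgram_to_sequence` and `add_noise` and the parameters `sd_base` and `sd_per_base` are illustrative only.

```python
import random

# ASSUMPTION: the 454 flow cycle is taken to be T, A, C, G (repeated).
FLOW_ORDER = "TACG"


def sequence_to_flowgram(seq, flow_order=FLOW_ORDER):
    """Ideal, noise-free flowgram: one homopolymer length per nucleotide flow."""
    if any(b not in flow_order for b in seq):
        raise ValueError("sequence contains bases not in the flow order")
    flows = []
    i, f = 0, 0
    while i < len(seq):
        base = flow_order[f % len(flow_order)]
        run = 0
        # Count how many consecutive copies of this flow's base are incorporated.
        while i < len(seq) and seq[i] == base:
            run += 1
            i += 1
        flows.append(float(run))  # 0.0 means no incorporation in this flow
        f += 1
    return flows


def flowgram_to_sequence(flows, flow_order=FLOW_ORDER):
    """Naive base calling: round each flow value to the nearest homopolymer length."""
    return "".join(
        flow_order[k % len(flow_order)] * max(0, round(v))
        for k, v in enumerate(flows)
    )


def add_noise(flows, sd_base=0.05, sd_per_base=0.1, rng=None):
    """HYPOTHETICAL noise model (not flowsim's): Gaussian noise whose standard
    deviation grows linearly with the true homopolymer length."""
    if rng is None:
        rng = random.Random(0)
    # Clamp at zero because a measured signal cannot be negative.
    return [max(0.0, v + rng.gauss(0.0, sd_base + sd_per_base * v)) for v in flows]


if __name__ == "__main__":
    true_seq = "TTTAACGGGGGT"
    noisy = add_noise(sequence_to_flowgram(true_seq), rng=random.Random(42))
    called = flowgram_to_sequence(noisy)
    print(true_seq, called)
```

Under this assumed model, the five-base G run has a noise standard deviation of 0.55 flow units versus 0.15 for a single base, so it is far more likely to be called with the wrong length, which is the kind of homopolymer error the abstract refers to.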

Similar resources

Critique: “Filtering duplicate reads from 454 pyrosequencing”

The paper describes a novel approach for filtering duplicate reads from 454 pyrosequencing data. This problem is motivated by the need to reduce sequencing errors and artificially duplicated reads in some applications, such as de novo whole-genome sequencing or metagenomics. Existing solutions are often based on nucleotide sequences, while raw flowgram values, which contain additional information...

Using state machines to model the Ion Torrent sequencing process and to improve read error rates

MOTIVATION The importance of fast and affordable DNA sequencing methods for present-day life sciences, medicine and biotechnology is hard to overstate. A major player is Ion Torrent, a pyrosequencing-like technology that produces flowgrams (sequences of incorporation values), which are converted into nucleotide sequences by a base-calling algorithm. Because of its exploitation of ubiquitous sem...

A Comparison of rpoB and 16S rRNA as Markers in Pyrosequencing Studies of Bacterial Diversity

BACKGROUND The 16S rRNA gene is the gold standard in molecular surveys of bacterial and archaeal diversity, but it has the disadvantages that it is often multiple-copy, has little resolution below the species level and cannot be readily interpreted in an evolutionary framework. We compared the 16S rRNA marker with the single-copy, protein-coding rpoB marker by amplifying and sequencing both fro...

Evaluation of the total analytical error in the flame photometry method

Background and Objectives: Performance goals for laboratory tests have most often been developed for total analytical error, imprecision (SD) and bias. A total analytical error goal requires that the combination of errors from all sources (random and systematic errors) be within some acceptable limit. M...

A Stylistic and Proficiency-based Approach to EFL Learners’ Performance Inconsistency

Performance deficiencies and inconsistencies among SLA or FL learners can be attributed to a variety of sources, including both systemic (i.e., language-related) and individual variables. Despite a rich background, the literature still has a gap as far as examining the issue from the perspective of language proficiency and learning style is concerned. To fill the gap, this study addressed EFL learner...

Volume 27

Publication date: 2011